Classifying Reddit comments by subreddit

Author

  • Jee Ian Tam
Abstract

Reddit.com is a website organized primarily into communities called subreddits, to which users post comments. Because subreddits can have very different cultures, we aim to classify comments by subreddit as a form of sentiment analysis. We use a publicly available dataset of Reddit comments from 2016 and classify comments from a selection of 20 subreddits drawn from the top 50 by total comment word count. We investigate the Recurrent Convolutional Neural Network (Recurrent CNN) and the Attention-Pooled CNN as more advanced models for the classification task. We find that the Recurrent CNN with GRU performs best of all the models tested, achieving an F1 score of 0.53 and a Cohen’s kappa score of 0.502. Across all subreddits, the Recurrent CNN model achieves its highest F1 score on the “anime” subreddit. Quantitative and qualitative analysis of the Recurrent CNN with GRU suggests that the model handles the task reasonably well. Further work suggests that the trained model can also be used to cluster subreddits by comment similarity.
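
The abstract reports results as an F1 score and Cohen’s kappa. As an illustration of how these two multi-class metrics can be computed from predicted and true subreddit labels, here is a minimal pure-Python sketch. It assumes a macro-averaged F1 (the unweighted mean of per-class F1), since the abstract does not state the averaging scheme, and the example labels are hypothetical. It is not the author’s evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2PR / (P + R), rewritten as 2TP / (2TP + FP + FN).
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohens_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: fraction of comments assigned the correct subreddit.
    p_o = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    # Expected chance agreement from the true and predicted label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1:
        # Degenerate case: both sequences consist of one identical class.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six comments.
y_true = ["anime", "nba", "anime", "politics", "nba", "anime"]
y_pred = ["anime", "nba", "politics", "politics", "anime", "anime"]

print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")        # macro F1: 0.667
print(f"Cohen's kappa: {cohens_kappa(y_true, y_pred):.3f}")  # Cohen's kappa: 0.478
```

Kappa is the stricter of the two here: four of six predictions are correct, but part of that agreement is expected by chance from the label frequencies alone, which pulls kappa below the raw accuracy.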

Similar sources

"I feel like I've hit the bottom and have no idea what to do": Supportive social networking on Reddit for individuals with a desire to quit cannabis use.

BACKGROUND Online communities can provide social support to those in need and can foster personal empowerment for individuals experiencing distress. This study examines the content of a Reddit community dedicated to the support of people trying to quit using cannabis, in order to develop an understanding of the type of social networking occurring on this subreddit (e.g., community). METHODS A...

A Statistical Analysis of Network Data from Reddit

Network structures are everywhere, from social networks to health epidemics. When making statistical models, it is important to be able to account for a network structure, since network data violates the assumption of independence that a regular linear model requires. In this paper, we explore descriptive statistics of networks, purely mathematical models of networks, and Exponential Random Gra...

Classifying Reddit Submissions CS 229A Final Project

This project concerns itself with user submissions to the social news site Reddit. Each post to this site must be made to a particular subreddit, which can be thought of as a message board for a collection of users interested in a (possibly loosely defined) topic. The largest subreddits are generally oriented around a single dimension of the posts which can either be content-based, like the sub...

A Conversation with Nathan Allen

It all started when Nathan Allen was stuck in the lab, babysitting his experiments for hours at a time. The 10- or 15-minute blocks between checking on his reactions weren’t long enough to dig into anything substantial. Instead, he recalls, “I posted snarky comments on the internet, which you can do in 5 minutes.” Soon, Allen was answering science questions on the Web site Reddit, where he saw p...

A Look Into the World of Reddit with Neural Networks

Creating, placing, and characterizing social media comments and reactions is a challenging problem. This is particularly true for reddit.com, a highly trafficked social media website with thousands of posts per day. Each post has an associated comment thread, and users of Reddit can vote the comments up or down, generating a net score, or “Karma,” for each comment. Users aspire to collect this ...

Publication date: 2017